The Bayesian Sorting Hat
Abstract
Size-constrained clustering (SCC) refers to the dual problem of using observations to determine latent cluster structure while simultaneously assigning observations to the unknown clusters, subject to an analyst-defined constraint on cluster sizes. Although several approaches have been proposed, SCC remains difficult because the size constraints introduce a combinatorial dependency between observations. Here we reformulate SCC as a decision problem and introduce a novel loss function that captures various types of size constraints. Unlike prior work, our approach is uniquely suited to situations in which size constraints reflect an external limitation or preference rather than an internal feature of the data-generating process. To demonstrate our approach, we develop a Bayesian mixture model for clustering respondents and apply it to both simulated and real categorical survey data. Our motivation for developing this decision-theoretic approach to SCC was to determine optimal team assignments for a Harry Potter-themed scavenger hunt based on categorical survey data from participants.
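The abstract does not state the loss function, so the following is only a minimal sketch of the general decision-theoretic idea under explicit assumptions, not the authors' method. It assumes (1) a latent class model (a mixture of independent categorical distributions) with fixed, known parameters, standing in for the posterior of the Bayesian mixture model; (2) a hypothetical per-respondent misassignment loss, so the expected loss of placing respondent i in cluster k is 1 - P(z_i = k | x_i); and (3) exact target sizes for every cluster. Under these assumptions, replicating cluster k into sizes[k] "seats" turns the size-constrained decision into a square assignment problem, solved here with a hand-written Hungarian algorithm. All function names and the toy numbers are hypothetical.

```python
import math


def hungarian(cost):
    """Minimum-cost perfect matching for a square cost matrix (Kuhn-Munkres, O(n^3)).

    Returns row_to_col, where row_to_col[i] is the column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (n + 1)      # column potentials (1-indexed)
    p = [0] * (n + 1)        # p[j] = row currently matched to column j (0 = free)
    way = [0] * (n + 1)      # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the root
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * n
    for j in range(1, n + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col


def membership_probs(responses, weights, theta):
    """P(z_i = k | x_i) for a latent class model with fixed (assumed) parameters:
    weights[k] = P(z = k), theta[k][q][c] = P(answer c to question q | z = k)."""
    probs = []
    for x in responses:
        logp = [math.log(w) + sum(math.log(theta[k][q][c]) for q, c in enumerate(x))
                for k, w in enumerate(weights)]
        top = max(logp)  # log-sum-exp shift for numerical stability
        un = [math.exp(lp - top) for lp in logp]
        total = sum(un)
        probs.append([e / total for e in un])
    return probs


def map_assign(probs):
    """Unconstrained rule: each respondent goes to their most probable cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def size_constrained_assign(probs, sizes):
    """Minimize the hypothetical expected loss sum_i (1 - P(z_i = a_i))
    subject to exactly sizes[k] respondents in cluster k."""
    if sum(sizes) != len(probs):
        raise ValueError("cluster sizes must sum to the number of respondents")
    seats = [k for k, s in enumerate(sizes) for _ in range(s)]  # one column per slot
    cost = [[1.0 - row[k] for k in seats] for row in probs]
    return [seats[j] for j in hungarian(cost)]


# Toy example: 2 clusters, 2 binary questions, 4 respondents (made-up numbers).
weights = [0.6, 0.4]
theta = [[[0.9, 0.1], [0.9, 0.1]],   # cluster 0 tends to answer 0
         [[0.1, 0.9], [0.1, 0.9]]]   # cluster 1 tends to answer 1
responses = [[0, 0], [0, 0], [0, 1], [1, 1]]
probs = membership_probs(responses, weights, theta)
print(map_assign(probs))                       # [0, 0, 0, 1]
print(size_constrained_assign(probs, [2, 2]))  # [0, 0, 1, 1]
```

On the toy data the size constraint binds: the unconstrained rule puts three respondents in cluster 0, while the constrained rule moves the most ambiguous respondent (answers [0, 1]) to cluster 1. In a full Bayesian analysis the membership probabilities would typically be averaged over posterior draws after resolving label switching, and `scipy.optimize.linear_sum_assignment` could replace the hand-written solver.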
Similar resources
The Sorting Hat Goes to College
In the Harry Potter stories [R], each new year at the Hogwarts School of Witchcraft and Wizardry starts with the ceremonial assignment of the new first-year students to one of the four houses: Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. This is a milestone for the young students because their assigned houses greatly influence their future direction. The crucial assignment process is entr...
Fast Construction of a Word-Number Index for Large Data
The paper presents work still in progress, but with promising results. We offer a new method for constructing word-to-number and number-to-word indices for very large corpus data (tens of billions of tokens), which is up to an order of magnitude faster than the current approach. We use HAT-trie for sorting the data and Daciuk’s algorithm for building a minimal deterministic finite state aut...
A Bayesian Analysis of HAT-P-7b Using the EXONEST Algorithm
The study of exoplanets (planets orbiting other stars) is revolutionizing the way we view our universe. High-precision photometric data provided by the Kepler Space Telescope (Kepler) enables not only the detection of such planets, but also their characterization. This presents a unique opportunity to apply Bayesian methods to better characterize the multitude of previously confirmed exoplanets...
Numerical and Experimental Investigation of the Effect of Different Orientation Angles on Crash Behavior of Composite Hat Shape Energy Absorber
Car body lightweighting and crashworthiness are two important objectives of car design. Due to their excellent performance, composite materials are extensively used in the car industry. In addition, reducing vehicle weight is effective in decreasing fuel consumption. Hat-shaped energy absorbers are used in car doors for side impact protection. The aim of these numerical models and expe...
A Non-parametric Bayesian Framework for Spike Sorting Using Optimal Quantization
This paper describes an approach that performs spike sorting by a nonparametric density estimation technique under a Bayesian framework. The technique is based on an optimal quantization method. We performed experiments on simulated and real spike signals. The results are comparable with what is reported in the literature.